PDLK: Plagiarism detection using linguistic knowledge

Authors

  • Asad Abdi
  • Norisma Idris
  • Rasim M. Alguliyev
  • Ramiz M. Aliguliyev
Abstract

Plagiarism is the reuse of someone else’s ideas, work, or words without sufficient attribution to the source. This paper presents a method for detecting external plagiarism that integrates the semantic relations between words with their syntactic composition. Available methods fail to capture meaning when comparing a source-document sentence with a suspicious-document sentence in cases where the two sentences share the same surface text (the words are the same) or are paraphrases of each other, which leads to inaccurate or unnecessary matches. The proposed method can improve detection performance because it avoids selecting a source sentence whose similarity to the suspicious sentence is high but whose meaning is different. It does so by computing the semantic and syntactic similarity between sentences. In addition, the method expands the words in sentences to address the problem of limited information, bridging the lexical gap between semantically similar contexts that are expressed in different wording. The method can also identify various kinds of plagiarism, such as exact copying, paraphrasing, sentence transformation, and changes to word structure within sentences. Experimental results show that the proposed method improves performance compared with the systems that participated in PAN-PC-11, and that it outperforms other existing techniques on the PAN-PC-10 and PAN-PC-11 datasets. © 2015 Elsevier Ltd. All rights reserved.
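The abstract does not give formulas, so the following is only a minimal sketch of the general idea of scoring a sentence pair by both word meaning and word order; it is not the authors' implementation. The toy `SYNONYMS` table (standing in for a lexical resource), the function names, the cosine and word-order formulas, and the weight `alpha` are all assumptions made for illustration.

```python
import math

# Hypothetical toy synonym table; a real system would query a lexical
# resource instead. These entries are assumptions for the example only.
SYNONYMS = {
    "car": {"automobile"},
    "automobile": {"car"},
    "buy": {"purchase"},
    "purchase": {"buy"},
}

def tokenize(sentence):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [t for t in tokens if t]

def related(a, b):
    """True if the words are identical or listed as synonyms."""
    return a == b or b in SYNONYMS.get(a, set())

def semantic_similarity(s1, s2):
    """Cosine similarity of binary vectors over the joint word set.

    A joint-set word counts as present in a sentence if the word itself
    or one of its synonyms occurs there (a simple form of word expansion).
    """
    joint = sorted(set(s1) | set(s2))
    v1 = [1.0 if any(related(w, t) for t in s1) else 0.0 for w in joint]
    v2 = [1.0 if any(related(w, t) for t in s2) else 0.0 for w in joint]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0

def word_order_similarity(s1, s2):
    """1 - ||r1 - r2|| / ||r1 + r2||, where r holds 1-based word positions."""
    joint = sorted(set(s1) | set(s2))

    def order_vector(tokens):
        # Position of the first matching (or synonymous) token, 0 if absent.
        return [
            next((i + 1 for i, t in enumerate(tokens) if related(w, t)), 0)
            for w in joint
        ]

    r1, r2 = order_vector(s1), order_vector(s2)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 0.0

def combined_similarity(sent1, sent2, alpha=0.8):
    """Weighted mix of both scores; alpha = 0.8 is an arbitrary assumption."""
    s1, s2 = tokenize(sent1), tokenize(sent2)
    return (alpha * semantic_similarity(s1, s2)
            + (1 - alpha) * word_order_similarity(s1, s2))

if __name__ == "__main__":
    print(combined_similarity("The dog bit the man.", "The man bit the dog."))
    print(combined_similarity("I buy a car.", "I purchase an automobile."))
```

In this sketch, "The dog bit the man" and "The man bit the dog" share every word, so the word-based score alone cannot separate them; the word-order term lowers the combined score. The synonym lookup raises the score of the second pair, which has almost no surface words in common.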

Similar resources

English-Persian Plagiarism Detection based on a Semantic Approach

Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...

Automated Plagiarism Detection System for Malayalam Text Documents

In this paper, a plagiarism detection tool for plagiarism detection in Malayalam documents is presented. Many language-sensitive tools for detecting plagiarism in natural language documents have been developed, particularly for English. Detecting plagiarism in Malayalam documents is particularly a challenging task because of the complex linguistic structure of Malayalam. The plagiarism detectio...

Journal:
  • Expert Syst. Appl.

Volume 42

Publication date: 2015